Rational and Convergent Learning in Stochastic Games
Abstract
This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that they fail to simultaneously meet both criteria. We then contribute a new learning algorithm, WoLF policy hillclimbing, that is based on a simple principle: “learn quickly while losing, slowly while winning.” The algorithm is proven to be rational and we present empirical results for a number of stochastic games showing the algorithm converges.
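The principle "learn quickly while losing, slowly while winning" can be illustrated with a minimal sketch of a single hill-climbing step in one state. The abstract does not give the update rule, so everything below is an assumption made for illustration: the function name `wolf_policy_step`, the step sizes `delta_win` and `delta_lose`, and the test for "winning" (the current policy's expected value under the current action-value estimates exceeds that of a running average policy). It is a sketch of the idea, not the paper's definitive algorithm.

```python
from typing import List, Tuple


def wolf_policy_step(
    policy: List[float],
    avg_policy: List[float],
    q_values: List[float],
    delta_win: float = 0.01,   # hypothetical small step used while "winning"
    delta_lose: float = 0.04,  # hypothetical larger step used while "losing"
) -> Tuple[List[float], bool]:
    """Move `policy` one step toward the greedy action for a single state.

    Assumption: the agent is "winning" when its current policy has a higher
    expected value than its average policy under the same Q estimates.
    """
    n = len(policy)
    if n < 2:
        # With one action there is nothing to adjust.
        return list(policy), False

    current_value = sum(p * q for p, q in zip(policy, q_values))
    average_value = sum(p * q for p, q in zip(avg_policy, q_values))
    winning = current_value > average_value

    # Learn slowly while winning, quickly while losing.
    delta = delta_win if winning else delta_lose

    best = max(range(n), key=lambda a: q_values[a])
    new_policy = list(policy)
    moved = 0.0
    for a in range(n):
        if a != best:
            # Never take more probability than the action currently holds.
            step = min(new_policy[a], delta / (n - 1))
            new_policy[a] -= step
            moved += step
    # Hand all removed probability mass to the greedy action.
    new_policy[best] += moved
    return new_policy, winning


if __name__ == "__main__":
    # Current and average policy tie, so the agent is treated as losing
    # and the larger step is applied.
    print(wolf_policy_step([0.5, 0.5], [0.5, 0.5], [1.0, 0.0]))
```

Because probability is only shifted between actions, the result remains a valid distribution. A full learner would also need to update the action-value estimates and the average policy, which this sketch leaves out.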
Similar resources
Rational and Convergent Learning in Stochastic Games
This paper investigates the problem of policy learning in multiagent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent when in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two prop...
Balancing Two-Player Stochastic Games with Soft Q-Learning
Within the context of video games the notion of perfectly rational agents can be undesirable as it leads to uninteresting situations, where humans face tough adversarial decision makers. Current frameworks for stochastic games and reinforcement learning prohibit tuneable strategies as they seek optimal performance. In this paper, we enable such tuneable behaviour by generalising soft Q-learning...
Rational Learning in Imperfect Monitoring Games
This paper provides a general framework to analyze rational learning in strategic situations where the players have private priors, private information and there is a role for passive and active learning. The theory of statistical inference for stochastic processes and of Markovian dynamic programming is applied to study players' asymptotic behavior in the context of repeated and of recurring ga...
Coco-Q: Learning in Stochastic Games with Side Payments
Coco (“cooperative/competitive”) values are a solution concept for two-player normal-form games with transferable utility, when binding agreements and side payments between players are possible. In this paper, we show that coco values can also be defined for stochastic games and can be learned using a simple variant of Q-learning that is provably convergent. We provide a set of examples showing ...
Online Learning in Stochastic Games and Markov Decision Processes
In their standard formulations, stochastic games and Markov decision processes assume a rational opponent or a stationary environment. Online learning algorithms can adapt to arbitrary opponents and non-stationary environments, but do not incorporate the dynamic structure of stochastic games or Markov decision processes. We survey recent approaches that apply online learning to dynamic environm...
Publication date: 2001